-
Notifications
You must be signed in to change notification settings - Fork 26
Add SGLang Connector for Prefill/Decode Disaggregation #64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of hight level comments
- we're moving the sidecar to llm-d-inference-scheduler repo (targeting v0.4)
- I suspect support of additional inference server would affect additional llm-d components (perhaps require changes in IGW as well)
| vLLMPort := flag.String("vllm-port", "8001", "the port vLLM is listening on") | ||
| connector := flag.String("connector", "nixlv2", "the P/D connector being used. Either nixl, nixlv2 or lmcache") | ||
| vLLMPort := flag.String("vllm-port", "8001", "the port vLLM is listening on (also used for SGLang)") | ||
| connector := flag.String("connector", "nixlv2", "the P/D connector being used. Either nixl, nixlv2, lmcache, or sglang") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Connector represents to the mechanism of transferring KV between P and D instances.
Does using sglang here represent the same concept or is it more an implementation of how the sidecar should communicate with the inferencese server? Based on connector_sglang.go it seem to be similar. Can you please confirm?
| if *connector != proxy.ConnectorNIXLV1 && *connector != proxy.ConnectorNIXLV2 && *connector != proxy.ConnectorLMCache { | ||
| logger.Info("Error: --connector must either be 'nixl', 'nixlv2' or 'lmcache'") | ||
| if *connector != proxy.ConnectorNIXLV1 && *connector != proxy.ConnectorNIXLV2 && *connector != proxy.ConnectorLMCache && *connector != proxy.ConnectorSGLang { | ||
| logger.Info("Error: --connector must either be 'nixl', 'nixlv2', 'lmcache', or 'sglang'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps its worthwhile to list options in an array (or map) and use that to generate L33, L56 and L57?
| port := flag.String("port", "8000", "the port the sidecar is listening on") | ||
| vLLMPort := flag.String("vllm-port", "8001", "the port vLLM is listening on") | ||
| connector := flag.String("connector", "nixlv2", "the P/D connector being used. Either nixl, nixlv2 or lmcache") | ||
| vLLMPort := flag.String("vllm-port", "8001", "the port vLLM is listening on (also used for SGLang)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- consider generalizing the variable name to
inferencePort? - similarly, change the description to a more generalized form?
may want to change the CLI flag to something more generic as well, but that would be a breaking change. We could accept a new flags (e.g., serving-port?), mark this as deprecated and allow the two to coexist for a while (e.g., prioritize use new flag in code, fallback to current if missing and log a deprecation warning).
|
@elevran Thanks for the review. Since this repository is deprecated, I’ll close this PR for now and resubmit it to |
|
thanks so much. |
Summary
This PR introduces a new SGLang connector that supports prefill/decode (P/D) disaggregation for the LLM-D routing sidecar. It enables concurrent prefill and decode operations through SGLang’s bootstrap mechanism.
Changes
Features
Test